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Preface 


This document contains the proceedings of the NASA Workshop on Computational 
Structural Mechanics, held at NASA Langley Research Center, November 18-20, 1987. 
The workshop was sponsored jointly by NASA Langley Research Center and NASA 
Lewis Research Center. 

The purpose of the workshop was to allow participants in Langley's and Lewis' 
Computational Structural Mechanics (CSM) research programs to meet and to 
share research objectives and accomplishments. The intent was to encourage a 
cooperative Langley/Lewis CSM program in which Lewis concentrates on engine 
structures applications, Langley concentrates on airframe and space structures 
applications, and all participants share technology of mutual interest. 

The workshop was organized into the following three sessions: 

I Concurrent Processing Methods and Applications 

II Advanced Methods & Testbed/Simulator Development 

III Computational Dynamics 

Session I dealt with parallel processing methods and languages, new computer 
hardware, and software architecture to exploit parallel computers. 

Session II dealt with the Langley CSM Testbed, the Lewis Engine Structures 
Computational Simulator, and Structural Analysis Technology involving finite 
elements, boundary elements, and probabilistic approaches. 

Session III dealt with advanced methods for structural dynamics. 

The use of trade names or names of manufacturers in this publication does not 
constitute an official endorsement of such products or manufacturers, either 
expressed or implied, by the National Aeronautics and Space Administration. 


W. Jefferson Stroud 



With the exception of a few adjustments made 
primarily for the purpose of uniformity, all 
papers have been published as received. 


— Editor 


CONTENTS 


PREFACE 

W. Jefferson Stroud, LaRC 


Part 1 


OVERVIEW OF THE NASA PROGRAM 

IN COMPUTATIONAL STRUCTURAL MECHANICS 

Murray Hirschbein, NASA Headquarters 


SESSION I - CONCURRENT PROCESSING METHODS AND APPLICATIONS 


CSM PARALLEL STRUCTURAL METHODS RESEARCH 

Olaf O. Storaasli, LaRC 

COMPUTATIONAL STRUCTURAL METHODS AT NASA LEWIS 

L. J. Kiraly, LeRC 

TRANSPUTER FINITE ELEMENT SOLVER 

Albert Danial, SPARTA 

TRANSPUTER PARALLEL PROCESSING AT NASA LERC 

Graham K. Ellis, ICOMP, LeRC 

INNOVATIVE ARCHITECTURES 

FOR DENSE MULTI-MICROPROCESSOR COMPUTERS 

Robert E. Larson, Expert-EASE 

PARALLEL LINEAR EQUATION SOLVERS 

FOR FINITE ELEMENT COMPUTATIONS 

James M. Ortega, Gene Poole, Courtenay Vaughan, Andrew Cleary, 

Brett Averick, U. of Virginia 

ALGORITHMS AND SOFTWARE FOR SOLVING FINITE ELEMENT EQUATIONS 

ON SERIAL AND PARALLEL ARCHITECTURES 

Alan George, U. of Tennessee 

PARALLEL EIGENVALUE EXTRACTION 

Fred A. Akl, Ohio U. 

PARALLEL ALGORITHMS AND ARCHITECTURES FOR 

COMPUTATIONAL STRUCTURAL MECHANICS 

Merrell Patrick, Shing Ma, Umesh Mahajain, Duke U. 

THE FORCE: A PORTABLE PARALLEL PROGRAMMING LANGUAGE 

SUPPORTING COMPUTATIONAL STRUCTURAL MECHANICS 

Harry F. Jordan, Muhammad S. Benten, Juergen Brehm, 

Aruna Ramanan, U. of Colorado 





METHODS FOR DESIGN AND EVALUATION OF PARALLEL 

COMPUTING SYSTEMS (THE PISCES PROJECT) 281 

Terrence W. Pratt, Robert Wise, Mary Jo Haught, U. of Virginia and ICASE, LaRC 

MULTIPROCESSOR ARCHITECTURE: SYNTHESIS AND EVALUATION 299 

Hilda M. Standley, U. of Toledo 

ENVIRONMENTAL CONCEPT FOR ENGINEERING SOFTWARE 

ON MIMD COMPUTERS 323 

L. A. Lopez and K. Valimohamed, U. of Illinois 

HIERARCHIAL PARALLEL COMPUTER ARCHITECTURE 

DEFINED BY COMPUTATIONAL MULTIDISCIPLINARY MECHANICS 355 

Joe Padovan, Doug Gute, Keith Johnson, U. of Akron 

Part 2 * 

SESSION II - ADVANCED METHODS & TESTBED/SIMULATOR DEVELOPMENT 

CSM RESEARCH: TESTBED DEVELOPMENT 387 

Ronnie E. Gillian, LaRC 

CSM TESTBED ARCHITECTURE 419 

Philip Underwood, Lockheed PARL 

COMPUTATIONAL STRUCTURAL MECHANICS 

ENGINE STRUCTURES COMPUTATIONAL SIMULATOR 459 

C. C. Chamis, LeRC 

INTERFACING MODULES FOR INTEGRATING 

DISCIPLINE SPECIFIC STRUCTURAL MECHANICS CODES 487 

Ned M. Endres, LeRC 

CSM RESEARCH: METHODS AND APPLICATION STUDIES 521 

Norman F. Knight, Jr., LaRC 

GENERIC ELEMENT PROCESSOR (APPLICATION TO NONLINEAR ANALYSIS) 571 

Gary Stanley, Lockheed PARL 

ASSESSMENT OF SPAR ELEMENTS AND FORMULATION OF SOME BASIC 2-D AND 

3-D ELEMENTS FOR USE WITH TESTBED GENERIC ELEMENT PROCESSOR 653 

Mohammad A. Aminpour, AS&M, LaRC 

DEVELOPMENT AND VERIFICATION OF LOCAL/GLOBAL 

ANALYSIS TECHNIQUES FOR LAMINATED COMPOSITES 683 

O. Hayden Griffin, Jr., VPI&SU 

CONTROL OF THE ERRORS OF DISCRETIZATION 

AND IDEALIZATION IN FINITE ELEMENT ANALYSIS 733 

Barna A. Szabo, Washington U. 

* Published under separate cover 


iv 


Part 3* 


BOUNDARY ELEMENTS FOR STRUCTURAL ANALYSIS 763 

Ray Wilson, P&W 

DEVELOPMENT OF AN INTEGRATED BEM FOR HOT 

FLUID-STRUCTURE INTERACTION 831 

P. K. Banerjee and G. F. Dargush, SUNY-Buffalo 

PROBABILISTIC STRUCTURAL ANALYSIS METHODS FOR SELECT SPACE 

PROPULSION SYSTEM STRUCTURAL COMPONENTS 865 

T. A. Cruse, SWRI 

PROBABILISTIC FINITE ELEMENTS (PFEM) APPLIED 

TO STRUCTURAL DYNAMICS AND FRACTURE MECHANICS 903 

Wing-Kam Liu, Ted Belytschko, A. Mani, G. Besterfield, Northwestern U. 

3-D INELASTIC ANALYSES FOR COMPUTATIONAL STRUCTURAL MECHANICS 943 

D. A. Hopkins and C. C. Chamis, LeRC 

SPECIALTY FUNCTIONS FOR SINGULARITY MECHANICS PROBLEMS 981 

Nesrin Sarigul, Ohio State U. 

SESSION III - COMPUTATIONAL DYNAMICS 

LARC COMPUTATIONAL STRUCTURAL DYNAMICS OVERVIEW 1013 

J. M. Housner, LaRC 

ALGORITHMS AND SOFTWARE FOR NONLINEAR STRUCTURAL DYNAMICS 1043 

Ted Belytschko, Noreen D. Gilbertsen, Mark O. Neal, Northwestern U. 

CONCURRENT ALGORITHMS FOR TRANSIENT FE ANALYSIS 1067 

M. Ortiz, Brown U. and B. Nour-Omid, Lockheed PARL 

COMPUTATIONAL METHODS AND SOFTWARE SYSTEMS 

FOR DYNAMICS AND CONTROL OF LARGE SPACE STRUCTURES 1105 

K. C. Park, C. A. Felippa, Charbel Farhat, J. D. Downer, J. C. Chiou, 

W. K. Belvin, U. of Colorado 

MULTI-GRID FOR STRUCTURES ANALYSIS 1133 

* Albert F. Kascak, LeRC 


* Published under separate cover 





RESEARCH AND DEVELOPMENT 


NASA Computational Structural Mechanics 
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VERY ADVANCED COMPUTERS 


Currently, the CSM effort is being actively pursued by two NASA centers. The Langley Research Center is focusing 
on airframe structures and large space structures while the Lewis Research Center is focusing on aeronautical and 
space propulsion structures. Both centers are building on a long history of activity in computational structural 
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Computational Structural Mechanics (LaRC) 
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Computational Structural Mechanics (LeRC) 
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The applications of CSM technology and the areas of fundamental research are strongly influenced by the existing 
and long-range needs of NASA and the general aerospace community. The general approach has been, and will 
continue to be, to develop capability based on long-range needs but to emphasize applications to current relevant 
problem areas. 
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3-D LOCAL MODEL 
NEAR HOLE 


Global/Local Analysis of CSM Focus Problem 
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MORE REALISTIC MODELS - 2000 NODES, 1000 ELEMENTS 
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SSME HPFTP - FIRST STAGE TURBINE BLADE 
SINGLE CRYSTAL BLADE DYNAMICS 
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SIGNIFICANT NASA COMPUTERS AVAILABLE TO CSM 
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Significant NASA Computers Available to CSM 
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CSM PARALLEL STRUCTURAL METHODS RESEARCH 
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time spent for data management and I/O by accomplishing these functions in parallel for large 
finite element applications. Finally, an advanced architecture parallel computer design based on 
a chordal ring of Inmos T-800 processors is planned for delivery to CSM in 1989. It should 
contain at least 15 processors each with a 64-bit floating point unit and a peak performance of 
approximately 90 million Whetstones for a total system peak performance of 34 MFLOPS. 
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It is expected that by maintaining the capability to explore methods exploiting a significant 
number of processors as well as implementing on computers exhibiting the maximum speed for 
today, we shall be in a position to have algorithms with the proper characteristics to run most 
efficiently on the fastest scientific computers in the future. 
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FLEX/32 20-PROCESSOR CONFIGURATION 
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Local memory Local busses 

4 MB / processor 
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MULTI-MICROPROCESSOR COMPUTER ARCHITECTURE 

- FAMILY OF LOW-COST SUPERCOMPUTERS - 
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STIFFENED PANEL WITH CIRCULAR CUTOUT 
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PISCES AND FORCE REDUCE 
MATRIX EQUATION SOLUTION TIME 
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Number of processors Number of processors 

Offers portability across FLEX, ENCORE, ALLIANT, HEP, SEQUENT 
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portability to other parallel computers. 






PARALLEL EIGENSOLVERS REDUCE SOLUTION TIME 
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Preconditioners and Refined Code 
Improve Performance of Equation Solvers 

Solve Ku = f 
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PARALLEL EQUATION SOLVERS REDUCE SOLUTION TIME 
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PARALLEL STRUCTURAL METHODS: FUTURE 
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Portability across Parallel Computers 


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The final item is to demonstrate portability of typical testbed processors across several parallel 
computers. Work is currently underway aimed at achieving this goal by using the Force 
extensions to concurrent Fortran (ref. 7). 
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7. Jordan, H., Benten, M., Arenstorf, N. and Ramanan, A., "Force User's Manual," Department of 
Electrical and Computer Engineering Technical Report 86-1-4R, University of Colorado, Boulder, 
CO, October, 1987. 
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Computational Structural Methods Activity - initiated 
2 years ago - to complement on-going computational 
structural analysis methods development. 
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The goal of our work is to exploit modern computing architectures. It is a new activity 
for Lewis. Our initial work has been directed to more fundamental concerns dealing with 
how we might formulate new algorithms to take advantage of parallel computing and how 
these new computers might be applied to data-taking and analysis. Our longer-term goal is 
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Code analysis 
Architecture Evaluation 
Architecture Synthesis 



We have placed strong emphasis on new innovative approaches which we feel will offer 
significant performance advantages in future structures codes. Along with this activity we 
have started to identify methods which may be employed to successfully re-utilize the large 
stock of existing codes that we currently use, because of the tremendous investment that the 
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'EVENTUALLY'- PURCHASE COMMERCIAL SYSTEM 


Our approach starts with fundamentals. There are many interested parties at Lewis 
who have come together to form a lab-wide team. We currently have representatives from 
Structures, Computational Fluid Mechanics, ICOMP, and Computer Services Division. By 
pooling our resources and studying representative problems we hope to develop some common 
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PROPOSED COMMERCIAL SYSTEM 




Planned New Capability 
Systems Under Development 
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- 2D graphics primitives for structural modeling/animation on TRANSPUTER. 

- A general model of parallel processors (as seen by structures codes) using both 
deterministic and statistical factors formulated for algorithm assessment. 

- Space station power systems control strategies were simulated on the CAPPS. 
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- Assess processing array limits for preconditioned conjugate gradient integration. 

- Compute the aerodynamic coefficients across the surface of an ATP blade in parallel. 

- Use ’PARAPHRASE’ to optimally convert existing FORTRAN codes to OCCAM. 
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GENERAL INFORMATION - 
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TRANSPUTER CHIP CONFIGURATION 
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Pascal, C, FORTRAN and Forth compilers are also available. 



HARDWARE FOR 
SBIR FEASIBILITY STUDY 



TRANSPUTER NETWORK 
ARRANGED IN A HELICAL GRID 
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EXECUTION TIMES FOR 
GLOBAL STIFFNESS MATRIX ASSEMBLY 



NUMBER OF ELEMENTS 




Parallel Assembly of the Global Stiffness 
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EXECUTION TIMES FOR 
GAUSS-JORDAN MATRIX INVERSION 



MATRIX DIMENSION 


Parallel Solution of Linear Equations 
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PERFORMANCE VERSUS PRICE 
FOR SEVERAL COMPUTING SYSTEMS 
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100 1000 10000 
PRICE IN THOUSANDS OF DOLLARS 




Large-Scale, Practical Transputer FE 
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VARIABLES AFFECTING PERFORMANCE 
OF PARALLEL FE SOLVERS 
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ITERATIVE METHODS 
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EXISTING SOLUTION METHODS 
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- APPEARS ACHIEVABLE 

- REQUIRES INTENSIVE RESEARCH AND DEVELOPMENT EFFORT 
- COULD REVOLUTIONIZE FE ANALYSIS 
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processor or in networks to build high 
performance concurrent systems. 
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The transputer link arrangement. 


Each process can be regarded as an 
independent unit of design. It 
communicates with other processes 
along point-to-point channels. 
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A PROGRAM ON A SINGLE TRANSPUTER 
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T414 based development board# plugs into IBM PC slot, 
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Animate structural model vibration response 
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FOR COMPUTATIONAL MECHANICS IN PROPULSION 



2-D APPLICATION PRIMITIVES 
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Absolute coordinate commands 

move.abs.2d 

point.abs.2d 
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Graphics Engine Implementation. Transparent to Applications Programs. 



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MULTIPLE PROCESSOR GRAPHICS DISPLAY ENGINE 
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GRAPHICS ENGINE NODE BUFFERS 
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Each node only performs a single graphics primitive 
computation. Frequently requested commands can 
reside on multiple nodes 
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Deadlock is avoided by using 'staging buffers' on each 
node to store pending work requests. The input buffer 
controls the number of work requests sent to each 
node so the number of pending work requests is never 
larger than the number of staging buffers. 
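The staging-buffer rule can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: the class names `Node` and `InputBuffer`, the method names, and the one-request-per-time-step node model are hypothetical and are not taken from the graphics engine itself. It shows the input buffer holding back requests so that no node ever has more pending requests than staging buffers.

```python
from collections import deque


class Node:
    """A graphics node with a fixed number of staging buffers (hypothetical model)."""

    def __init__(self, n_staging):
        self.n_staging = n_staging
        self.staging = deque()

    def accept(self, request):
        # A node can never be handed more work than it has staging buffers.
        if len(self.staging) >= self.n_staging:
            raise OverflowError("no free staging buffer")
        self.staging.append(request)

    def step(self):
        """Finish one pending request per time step, if there is one."""
        return self.staging.popleft() if self.staging else None


class InputBuffer:
    """Forwards a request only while the target node has a free staging buffer."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.pending = [0] * len(nodes)   # requests sent but not yet completed
        self.backlog = deque()            # requests still waiting to be sent

    def submit(self, node_id, request):
        self.backlog.append((node_id, request))

    def dispatch(self):
        held = deque()
        while self.backlog:
            node_id, request = self.backlog.popleft()
            if self.pending[node_id] < self.nodes[node_id].n_staging:
                self.nodes[node_id].accept(request)
                self.pending[node_id] += 1
            else:
                held.append((node_id, request))   # keep it for a later pass
        self.backlog = held

    def complete(self, node_id):
        self.pending[node_id] -= 1


nodes = [Node(2), Node(2)]
buf = InputBuffer(nodes)
requests = [f"req{i}" for i in range(10)]
for i, req in enumerate(requests):
    buf.submit(0 if i < 8 else 1, req)    # eight requests for node 0, two for node 1

done, max_pending = [], 0
while buf.backlog or any(buf.pending):
    buf.dispatch()
    max_pending = max(max_pending, max(buf.pending))
    for node_id, node in enumerate(nodes):
        finished = node.step()
        if finished is not None:
            buf.complete(node_id)
            done.append(finished)
```

Because `accept` is only called when a staging buffer is free, a node never has to refuse work, which is the situation that could otherwise stall the sender.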
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Applications include control of unbalance forces in 
rotating machinery and control of transient vibrations 





PRESENTATION OVERVIEW 
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CORPORATE OVERVIEW 
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(30 MHz). 

- On-chip memory controller 



IMS T800 FLOATING POINT TRANSPUTER 
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Primary Application: Computational Structura 
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PARALLEL LINEAR EQUATION SOLVERS 
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GENERAL OBJECTIVES 
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The overall objective of this research is to develop efficient methods for the solution of linear and nonlinear 
systems of equations on parallel computers and supercomputers, and to apply these methods to the solution of problems in structural 
analysis. Attention has been given so far only to linear equations. 
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Factorization Methods on Flex/32 
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The Choleski method first factors K into $LL^T$, the product of a lower triangular matrix and its transpose. The 
solution of the displacement equations Kx = f is then completed by solving the triangular systems of equations Lz = f, 
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for j = k+1 to min(k+β, N)            for j = k+1 to min(k+β, N) 

    for i = j to min(k+β, N)              cmod(j,k) 
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processor 1 processor! processor p 

Wrapped Interleaved Column Storage 




The first figure shows a basic Choleski factorization code using the so-called kji form (L is computed by columns 
using immediate updates). For simplicity, banded storage with bandwidth β is used in this code. To the right of the 
first code is shown the same code in a short-hand version: cdiv(k) forms the first column of L and cmod(j,k) modifies 
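As an illustration of the kji form and the cdiv/cmod shorthand, here is a minimal serial sketch in Python. Several things in it are assumptions rather than the authors' code: it uses dense list-of-lists storage instead of banded storage, 0-based indices, a `beta` parameter read as the half-bandwidth, and a hypothetical `owner` helper that interprets "wrapped interleaved column storage" as assigning column j to processor j mod p. The two triangular solves that complete the solution of Kx = f are included.

```python
import math


def cdiv(A, k, beta):
    """Column division: turn the updated column k of A into column k of L."""
    n = len(A)
    A[k][k] = math.sqrt(A[k][k])
    for i in range(k + 1, min(k + beta, n - 1) + 1):
        A[i][k] /= A[k][k]


def cmod(A, j, k, beta):
    """Column modification: immediately update column j with finished column k."""
    n = len(A)
    for i in range(j, min(k + beta, n - 1) + 1):
        A[i][j] -= A[i][k] * A[j][k]


def choleski_kji(K, beta):
    """Return L with K = L L^T for a symmetric positive definite banded K."""
    n = len(K)
    A = [row[:] for row in K]             # work on a copy
    for k in range(n):
        cdiv(A, k, beta)
        for j in range(k + 1, min(k + beta, n - 1) + 1):
            cmod(A, j, k, beta)
    return [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def solve(L, f):
    """Solve L z = f by forward substitution, then L^T x = z by back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (f[i] - sum(L[i][j] * z[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


def owner(j, p):
    """Assumed wrapped assignment: column j is stored on processor j mod p."""
    return j % p


K = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
f = [2.0, 4.0, 6.0, 13.0]                 # equals K times [1, 2, 3, 4]
L = choleski_kji(K, beta=1)
x = solve(L, f)
columns_on_2_procs = [owner(j, 2) for j in range(len(K))]
```

In a parallel version, each processor would apply cmod only to the columns it owns once column k is available; that scheduling is not modelled here.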
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Choleski on the Flex/32 
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Speedup for problems too large for a single processor was computed using 


This table shows running times for the parallel Choleski code for the panel focus problem on the FLEX/32. 
Timings are given for 1, 2, 4, 8 and 16 processors. The corresponding speed-ups and Mflop rates are also given. The 
speed-ups are calculated using a serial code. For comparison, times are also given for the parallel code on a single pro- 



Parallel Choleski Mast and Panel Results 


This figure plots the speedups for the panel focus problems as well as two different mast problems. The first mast 
problem has a small bandwidth (15) and very poor speedups are obtained. The second mast problem has a bandwidth 
of about 50 and the speed-ups are better. The panel problems have much larger bandwidths, as given in the table, and 
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Conjugate Gradient Iteration 
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Running on Panel Focus Problem 


The second type of method is iterative, the conjugate gradient method with two different preconditioners: SSOR 
polynomial and incomplete Choleski factorization. These methods have been used to solve both the panel and mast 
focus problems on the FLEX/32. The incomplete Choleski codes for the FLEX/32 use the FORCE package developed 
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Preconditioned Conjugate Gradient Code 




Key parts are $Kp^k$ and preconditioning 


A code is given for the preconditioned conjugate gradient method. ( , ) is the inner product of two vectors. The 
first two statements compute the next iterate $x^{k+1}$ by minimizing the quadratic function $x^T K x - 2x^T f$ along the line 
$x^k - \alpha p^k$. The residual at $x^{k+1}$ is then computed and there is a test for convergence. This test can be based on 
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Preconditioners 
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SSOR Preconditioned Conjugate Gradient 
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18 S 


This table gives the number of iterations and the running times on the Flex/32 for the conjugate gradient and 
SSOR preconditioned conjugate gradient codes. The mast and panel focus problems are the same as used for the 
Choleski factorization. The convergence criterion used was $(r^{k+1}, r^{k+1}) < 10^{-6}$, which gives about four decimal places 
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Speedup PCG Flex 


This chart shows the speed-ups of the SSOR polynomial preconditioned conjugate gradient method. These 
speed-ups are a little worse than for the conjugate gradient method but still satisfactory. 
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This chart compares the run times in seconds on the FLEX/32 of the Choleski factorization and preconditioned 
conjugate gradient methods. Times are given for three sizes of the panel focus problem and for 1 and 16 processors 
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Conjugate Gradient for PANEL.648 
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The running times for an iterative method depend critically upon the convergence criterion. This table shows the 
results of varying the parameter $\epsilon$ in the convergence test $(r^{k+1}, r^{k+1}) < \epsilon$ for the conjugate gradient method. 
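The dependence of the iteration count on the convergence parameter can be reproduced on a toy problem. The following is a minimal sketch, not the FLEX/32 code: it uses the common sign convention $x^{k+1} = x^k + \alpha p^k$, a simple diagonal (Jacobi) preconditioner as a stand-in for the SSOR polynomial and incomplete Choleski preconditioners, and a small tridiagonal test matrix chosen only for illustration. The function name `pcg` and its arguments are hypothetical. The stopping test is the one quoted above, the inner product of the residual with itself compared against a tolerance.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(K, v):
    return [dot(row, v) for row in K]


def pcg(K, f, eps, max_iter=1000):
    """Preconditioned conjugate gradient for K x = f, stopping when (r, r) < eps.

    The preconditioner is the diagonal of K (an assumption made for brevity).
    Returns the approximate solution and the number of iterations used.
    """
    n = len(f)
    diag = [K[i][i] for i in range(n)]
    x = [0.0] * n
    r = f[:]                                   # residual f - K x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Kp = matvec(K, p)                      # the dominant cost: K times p
        alpha = rz / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        if dot(r, r) < eps:                    # convergence test (r, r) < eps
            return x, k
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite test matrix (tridiagonal, varying diagonal).
n = 30
K = [[0.0] * n for _ in range(n)]
for i in range(n):
    K[i][i] = 2.0 + 0.1 * i
    if i > 0:
        K[i][i - 1] = K[i - 1][i] = -1.0
f = [1.0] * n

x_loose, it_loose = pcg(K, f, eps=1e-2)
x_tight, it_tight = pcg(K, f, eps=1e-12)
residual = max(abs(a - b) for a, b in zip(matvec(K, x_tight), f))
```

Comparing `it_loose` with `it_tight` shows how tightening the tolerance lengthens the run, which is why timings for iterative methods should always be quoted together with the convergence criterion.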
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The chart shows what can be achieved by preconditioning. The problem is a three-dimensional Poisson-type 
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Conjugate Gradient on CRAY-2 
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Conjugate Gradient Convergence: (r k+1 ,r k+1 ) < 10 
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Summary and Conclusions 
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Algorithms and Software for Solving Finite Element 






General Objectives 
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Specific Tasks 
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Symmetric Positive Definite Systems 
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SPARSPAK contains state-of-the-art algorithms for dealing with steps 1-4. 


Symmetric Positive Definite Systems 
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SPARSPAK - Design Considerations 
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SPARSPAK - Features 
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SPARSPAK - Structure of the Package 
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SPARSPAK - Installation in the Testbed 
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determine whether exploiting the effect of the constraints is worth the effort. The advantage of the current approach used by RSEQ 
is that many constraint sets can be considered with only one pass through RSEQ. 



Detecting Parallelism in Sparse Matrix Computation 
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Detecting Parallelism - elimination trees 






Elimination trees - an example 
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Elimination trees - an example 
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The elimination trees associated with the matrices. 
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Elimination trees ... facts 
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Elimination trees - restructuring 
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An overall strategy 
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Solving Indefinite Sparse Systems 
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The use of dynamic data structures tends to lead to very complicated code, and substantial computational and storage overhead. 



Solving Indefinite Systems - Partial Pivoting 
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Solving Indefinite Systems - an alternative approach 
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Solving Indefinite Systems - an alternative approach 

The basic strategy is to create from the structure of A a data structure which can accommodate all the nonzeros in M and U, 
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. Alan George, Joseph Liu and Esmond Ng, “Communication Results for Parallel Sparse Cholesky Factorization on a 
Hypercube”, Parallel Computing, (submitted). 


PARALLEL EIGENVALUE EXTRACTION 
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Generalized Eigenproblem 

$[K][\Phi] = [M][\Phi][\omega^2]$ 

N - degrees of freedom 
Required n eigenpairs, n < N 
[K] positive-definite 
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New parallel algorithm for the solution of large scale eigenproblems 
in finite element applications. 


• Work is in progress to implement algorithm on NAS Cray 2 computer 
at Ames. 

• Assumptions 

1 - Linear elastic finite element models 

2 - n lower order eigenpairs are required, i.e. $\omega_1^2 < \omega_2^2 < \dots < \omega_n^2$ 

3 - [K] is positive-definite 

4 - [M] is positive semi-definite 
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[Figure: Finite Element Model Subdivided into m Domains, with a typical domain i labelled] 
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• Consider a parallel computer with (m+1) processors (tasks). 

• Designate the first processor as a global processor (task). 

• Designate the remaining m-processors as domain processors (tasks). 

• A finite element model can be divided into a number of domains equal 
to m. 

• A star architecture (or tree) is the first to be investigated. 





$[K][\Phi] = [M][\Phi][\omega^2]$ 

1 - Creation of $K^e$ & $M^e$ 

2 - Eigensolution (Modified Subspace) 

3 - Equation Solver (Frontal Solution) 
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• Three major steps of large computational requirements: 

1 - Creation of element stiffness and mass matrices. 

2 - Extraction of a set of eigenpairs. 

3 - Solution of a set of simultaneous linear equations. 


• The merits of selecting the modified subspace method for step #2 and 
the frontal solution for step #3 above are discussed in the next 
viewgraphs. 
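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Modified Subspace Method 

$[V]^*_{\ell+1} = [K]^{-1}[M][V]_\ell = [K]^{-1}[B]_\ell$ 

where $\ell = 1, 2, 3, \ldots$ 

$\beta_\ell = \tfrac{1}{2}\,(1 + r_{\ell-1})/\omega_n^2$ 

[Figure: eigenvalue axis running from $\omega_1$ through $\omega_n$ to $\omega_N$, with the subspace marked] 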



The Modified Subspace method iterates simultaneously for a subset of 
eigenpairs $[\Phi, \omega]$ of the generalized eigenproblem: 

1 - Let $[V]_1$ be n starting eigenvectors. Experience has shown that random 
numbers can be used here. A number of techniques are available in the 
literature for selecting $[V]_1$. 

2 - Operate on each $[V]_\ell$ as follows 

$[V]^*_{\ell+1} = [K]^{-1}[M][V]_\ell = [K]^{-1}[B]_\ell$ 

where $\ell = 1, 2, 3, \ldots$ 

3 - Modify $[V]^*$ to increase convergence rate by one third on average 

$[V]^*_{\ell+1} \leftarrow [V]^*_{\ell+1} - \beta_\ell [V]_\ell$ 

where: $\beta_\ell = 0$ for $\ell = 1$ and $\ell > 11$; otherwise 

$\beta_\ell = 0.5\,(1 + r_{\ell-1})/\omega_n^2$ 

$r_{\ell-1}$ are the interval points of the 11th-order Lobatto 
rule on [-1,1] 

Roots of the 11th Order Lobatto Rule (Kopal 1961) 

r_1    -0.9533098466      r_6     0.0000000000 
r_2    -0.8463475646      r_7    +0.2492869301 
r_3    -0.6861884690      r_8    +0.4829098210 
r_4    -0.4829098210      r_9    +0.6861884690 
r_5    -0.2492869301      r_10   +0.8463475646 



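Subspace 

$[K]^*_{\ell+1} = [V]^{*T}_{\ell+1}[K]^e[V]^*_{\ell+1}$ 

$[M]^*_{\ell+1} = [V]^{*T}_{\ell+1}[M]^e[V]^*_{\ell+1}$ 

The Auxiliary Eigenproblem 

$[K]^*_{\ell+1}[Q]_{\ell+1} = [M]^*_{\ell+1}[Q]_{\ell+1}[\Omega]_{\ell+1}$ 

Improved Eigenvectors 

$[V]_{\ell+1} = [V]^*_{\ell+1}[Q]_{\ell+1}$ 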
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4 - Project K and M onto the required subspace. 

5 - Solve the auxiliary eigenproblem to obtain $[Q]_{\ell+1}$ and $[\Omega]_{\ell+1}$. 

6 - An improved set of eigenvectors $[V]_{\ell+1}$ can be obtained. 

7 - Test for convergence on $\omega_n^2$. Repeat steps 2 to 6 until desired accuracy 
is achieved. 


Note 

1. Step #2 is performed using the frontal solution, concurrently within 
each domain. 

2. Steps 1, 3, 4 and 6 are processed concurrently within each domain. 
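The iteration loop of steps 1, 2 and 4 to 7 can be sketched serially with NumPy. This is a sketch of classical subspace iteration only: the over-relaxation of step 3 is deliberately omitted, a dense `numpy.linalg.solve` stands in for the frontal solution, and the projected mass matrix is assumed to be positive definite so that a Cholesky reduction can be used for the auxiliary eigenproblem. The function name `subspace_iteration`, its arguments, and the test matrices are hypothetical.

```python
import numpy as np


def subspace_iteration(K, M, n_pairs, tol=1e-12, max_iter=500, seed=0):
    """Classical subspace iteration for the lowest n_pairs of K phi = w^2 M phi.

    Returns the eigenvalue estimates (w^2), the vectors, and the iteration count.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((K.shape[0], n_pairs))   # step 1: random start vectors
    lam_old = np.zeros(n_pairs)
    for it in range(1, max_iter + 1):
        V_star = np.linalg.solve(K, M @ V)           # step 2: [K][V*] = [M][V]
        K_r = V_star.T @ K @ V_star                  # step 4: project K and M
        M_r = V_star.T @ M @ V_star
        # Step 5: auxiliary eigenproblem K_r Q = M_r Q Omega, reduced to a
        # standard symmetric problem through the Cholesky factor of M_r.
        L = np.linalg.cholesky(M_r)
        L_inv = np.linalg.inv(L)
        A = L_inv @ K_r @ L_inv.T
        lam, Y = np.linalg.eigh((A + A.T) / 2.0)
        Q = L_inv.T @ Y
        V = V_star @ Q                               # step 6: improved eigenvectors
        if np.max(np.abs(lam - lam_old) / lam) < tol:    # step 7: convergence test
            return lam, V, it
        lam_old = lam
    return lam, V, max_iter


# Illustrative test problem: fixed-end spring chain with unit masses.
N = 10
K = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
M = np.eye(N)
lam, V, n_iter = subspace_iteration(K, M, n_pairs=3)
lam_ref = np.linalg.eigvalsh(K)[:3]
```

In the parallel scheme described in the notes, the solve in step 2 and the projections in step 4 would be carried out domain by domain, with only the small auxiliary eigenproblem handled by the global task.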
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• Rate of convergence of the modified subspace is 33% faster on average 
compared to the classical subspace method. 

• Figure shows typical behavior. 

• Most computations are performed on an element by element basis. 
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Frontal Solution 


1. Gauss elimination technique. 

2. Underlying philosophy is based on processing of elements one by one. 

3. Simultaneous assembly and elimination of variables. 

4. The optimum frontal width is at most equal to the optimum band width. 

5. Numbering of nodes has no impact on optimality while numbering of elements 
is important to minimize the frontal width. 

6. More efficient for solid elements and elements with mid-side nodes. 

7. It requires a pre-front to determine last appearance of each node. 

8. It lends itself to parallel solutions. 
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Multi-Frontal Solution 


Within each domain 
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For domain i 

$[K]^i [V]^*_{\ell+1} = [B]_\ell$ 

Assembly and elimination gives 

$U_{dd} V_d + K_{dF} V_F = B_d$ 
$K_{FF} V_F = B_F$ 

where $U_{dd}$ is the upper triangular matrix for domain i 
$V_d$ are the variables within domain i 
$V_F$ are the variables along the global front of domain i 
$B_d$ & $B_F$ are right-hand sides for domain & global front, respectively 


For global fronts 

$K = \sum_{i=1}^{m} K_{FF}^{i}$ 

$B_F = \sum_{i=1}^{m} B_F^{i}$ 

$K V_F = B_F$ 
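The split between domain elimination and the global-front solve can be illustrated with a small serial NumPy sketch. It is written under assumptions: each domain's interior unknowns are eliminated with a dense solve rather than a frontal Gauss elimination, the block names `K_dd`, `K_dF`, `K_FF`, `B_d`, `B_F` and the functions `condense` and `multifrontal_solve` are hypothetical, and the two-domain spring chain is chosen only to make the result easy to check.

```python
import numpy as np


def condense(K_dd, K_dF, K_FF, B_d, B_F):
    """Eliminate one domain's interior unknowns; return its front contribution."""
    S = K_FF - K_dF.T @ np.linalg.solve(K_dd, K_dF)   # condensed front matrix
    g = B_F - K_dF.T @ np.linalg.solve(K_dd, B_d)     # condensed right-hand side
    return S, g


def multifrontal_solve(domains):
    """Sum the domain contributions, solve on the global front, back-substitute."""
    parts = [condense(*d) for d in domains]           # independent per domain
    K = sum(S for S, _ in parts)                      # assembled front matrix
    B = sum(g for _, g in parts)                      # assembled front load
    V_F = np.linalg.solve(K, B)                       # global task
    V_d = [np.linalg.solve(K_dd, B_d - K_dF @ V_F)    # back in each domain
           for (K_dd, K_dF, _, B_d, _) in domains]
    return V_F, V_d


# Five-unknown spring chain; unknown 2 is the front shared by the two domains.
K_dd = np.array([[2.0, -1.0], [-1.0, 2.0]])
domain_1 = (K_dd, np.array([[0.0], [-1.0]]), np.array([[1.0]]),
            np.array([1.0, 2.0]), np.array([1.5]))
domain_2 = (K_dd, np.array([[-1.0], [0.0]]), np.array([[1.0]]),
            np.array([4.0, 5.0]), np.array([1.5]))
V_F, V_d = multifrontal_solve([domain_1, domain_2])

# Reference: assemble and solve the full system directly.
K_full = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
x_ref = np.linalg.solve(K_full, np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
x = np.concatenate([V_d[0], V_F, V_d[1]])
```

The calls to `condense` do not depend on one another, so they are the part that maps onto the m domain processors; only the small front system is solved by the global processor.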
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Successful implementation of the new parallel algorithm depends on: 

1 - Maximizing the efficiency of communication links between the global 

task and the domains 

2 - Minimizing sequential computational steps 

3 - Multi-threaded I/O 


Final report will be available in the summer of 1988. 



Anticipated Benefits 


Parallel eigenvalue extraction 
algorithm to maximize efficiency 
and speed-up of computations. 

A general purpose eigenproblem 
solver for finite element analysis 
in parallel computing environments 
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[Flowchart: operating system start; the data file is copied to each task] 

GLBFRONT 

1. read/check first data card 
2. set-up VEC for data input 
3. data input and check 
4. reset VEC for global fronts 
5. pre-front for global fronts 
6. global fronts solution 
7. $V_F$ to DOMFRONT 
8. subspace solution 
9. Q to DOMFRONT 
10. convergence test 

DOMFRONT 

1. read/check first data card 
2. set-up VEC for data input 
3. data input and synthesis 
4. reset VEC for domain 
5. pre-front for domain 
5. element K and M matrices 
6. domain assembly/elimination 
7. $K_{FF}$ to GLBFRONT 
8. domain solution and subspace 
9. K* and M* to DOMFRONT 
10. improved eigenvectors V 
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DUKE’S CSM SUPPORTED ACTIVITY 
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
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CURRENT AND FUTURE EFFORTS 
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parallelism, communication and synchronization mechanisms, ease of learning the language,
readability of the program, and testing and debugging support.
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Dynamic creation of parallel work 
Efficient implementation of primitives 
Language description and manual available 
Shell scripts for compilation, execution 
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Experience shows that higher speeds can be obtained by 
optimization techniques such as loop unrolling. 

Efficiency concerns lead to careful analysis and redesign 
of existing macros, such as the thorough analysis and 
optimization of the barrier by Arenstorf[4]. 
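A barrier is the synchronization point at which every process waits until all processes have arrived. As a rough illustration only (this is neither Arenstorf's optimized implementation nor Force macro code), a reusable counter-based barrier might look like the following Python sketch. The names `CounterBarrier` and `run_phases` are hypothetical.

```python
import threading


class CounterBarrier:
    """Reusable barrier for a fixed number of threads (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.n:
                # Last arrival: reset the counter and release everyone.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Wait until the barrier moves on to the next generation.
                while gen == self.generation:
                    self.cond.wait()


def run_phases(n_threads=4):
    """Each thread writes a partial result, then all read the complete set."""
    barrier = CounterBarrier(n_threads)
    partial = [0] * n_threads
    totals = [0] * n_threads

    def worker(me):
        partial[me] = me + 1      # phase 1: independent work
        barrier.wait()            # nobody proceeds until all have written
        totals[me] = sum(partial) # phase 2: safe to read every partial result

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

The generation counter is what makes the barrier reusable: a thread released from one use cannot be confused with a thread arriving early for the next.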
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The identical source for these has been run on:

- Encore Multimax
- Sequent Balance
- Alliant FX/8
- Cray 2
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sity of Colorado Center for Space Structures and Controls. 
Dr. Farhat has found the Force useful for writing numerous finite
element codes that can be run unchanged on
several different multiprocessors.
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A major improvement in the Force as a complete parallel
programming language is support for dynamic generation
of parallel work. While many scientific codes can be written
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has not been completely tested, but it runs simple test 
cases correctly and there are no known problems at this 
time. Some substantial codes written by Charbel Farhat 
have been run on the Cray 2 using the system, and that 
part of the Force which he uses seems to be correct. 
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METHODS FOR DESIGN AND EVALUATION 
OF PARALLEL COMPUTING SYSTEMS 
(The PISCES Project) 
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The PISCES project goal is to build a testbed programming environment to support the evaluation of a large range of parallel 
architectures. This environment, named PISCES, is intended to be Fortran based. 
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Result: New parallel algorithms and parallel versions of existing 
sequential codes may be easily evaluated on the FLEX/32 
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The PISCES 2 system provides a rich environment for experimenting with parallel programming concepts. The mai 
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Choose run-time options: tracing, timelimit, etc. 


The PISCES 2 system as implemented on the CSM FLEX/32 consists of several software 
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PISCES 2 Implementation (continued) 
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Num. of Processors 
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Publications and Presentations 
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MULTIPROCESSOR ARCHITECTURE: 
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Performance of a multiprocessor is 
determined by 


the algorithms 

the programming language 

the program 

the language support 
environment and operating 
system 

number of processing elements 

characteristics of the 
processing elements 

interconnection network 

shared memory organization 
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The difficulty in the analysis of multiprocessor performance may be attributed to 
the large number of factors that may affect performance both independently and through 
interactions. Such factors may be roughly divided into software and hardware categories: 
software--the applications algorithm, the nature of the programming language, the 
efficiency of the program, and the language support environment and operating system; 
hardware--the number of processing elements, the capabilities of the processing 
elements, the interconnection network, and the organization of memory. 
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Goals: 


Ignore the algorithm effect 


Remove the 

language/programming effect 


Study only those 
characteristics of the 
structure of the architecture. 




The goal of this study is to remove the influence of the choice of algorithm used 
for a particular application and to remove the effects of the high-level language and the 
efficiency of the program. The study concentrates on only those characteristics of the 
structure of the architecture. The "structure of the architecture" is defined to include 
those parameters that distinguish an architectural design at the diagram level. For 
example, the interconnection network plays an integral part in such a description, whereas 
the capabilities of the individual processing elements, although crucial to the execution of 
the program, are not represented in the diagram. 





Removing the language/programming 
effect: 

Express maximum amount of 
parallelism 

Data Flow Diagrams (operation 
level) 

Data Flow Diagrams (program 
module level) 

Partitioning and mapping of 
data flow diagrams 
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A high-level language notation to express the maximum amount of parallelism is 
required to assist in removing the language/programming effect. The EASY-FLOW 
language, based on the data flow paradigm, offers a mechanism for expressing the data 
dependencies between program modules, down to the level specified by the programmer. 
These data dependencies are obstacles to parallel execution. Modules which are not 
related by data dependencies may be executed in parallel. The execution environment 
must include a mechanism for the partitioning and mapping of the resulting data flow 
diagrams. 
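The idea that modules unrelated by data dependencies may run in parallel can be sketched as a grouping of modules into stages. This is not EASY-FLOW notation; it is a minimal Python illustration in which the function name `parallel_stages` and the module graph are hypothetical.

```python
def parallel_stages(deps):
    """Group modules into stages; modules in one stage share no data dependency.

    deps maps each module name to the set of modules whose output it consumes.
    """
    remaining = {m: set(d) for m, d in deps.items()}
    stages = []
    while remaining:
        # A module is ready when all of its inputs have been produced.
        ready = sorted(m for m, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic or unsatisfied data dependency")
        stages.append(ready)
        for m in ready:
            del remaining[m]
        for d in remaining.values():
            d.difference_update(ready)
    return stages


# Hypothetical module graph for a finite element run.
deps = {
    "read_mesh": set(),
    "read_loads": set(),
    "form_stiffness": {"read_mesh"},
    "form_load_vector": {"read_mesh", "read_loads"},
    "solve": {"form_stiffness", "form_load_vector"},
}
stages = parallel_stages(deps)
```

Each inner list holds modules that could be dispatched to different processors at the same time; mapping those modules onto actual processors is the partitioning and mapping problem mentioned in the text and is not addressed by this sketch.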
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


Study the impact of the memory organization 
and the interconnection network 


A queuing network mathematical model is 
developed for representing the effect of 
expanding separate shared memories into a 
system of memory hierarchies. 


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The two elements of the architectural structure selected for initial study are the 
memory organization and the interconnection network. A queuing network statistical 
model for a multiprocessor with shared memory is expanded to include a hierarchy of 
memory modules at each shared memory cluster. 
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Model is based on an expanded GMI 
(General Model for Memory Interference) 


Performance is measured as the expected 
number of busy memories. 
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The shared memory hierarchy model is based on the General Model for Memory 
Interference (GMI) suggested by Hoogendoorn. Each processor cycles between a 
random access to a particular level within a memory cluster and a time interval in which 
internal computation is performed. Requests to the same memory cluster are queued at 
the cluster. Performance is measured by the expected number of busy memories. 
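The performance measure, the expected number of busy memories, can be illustrated with a much simpler model than GMI. The sketch below assumes that every processor issues one request per cycle to a memory chosen uniformly at random, with no queues, no memory hierarchy and no computation interval, so it is not the model described above. The function names are hypothetical.

```python
import random


def expected_busy_memories(processors, memories):
    """Expected number of distinct memories addressed in one cycle.

    Assumes one uniformly random request per processor per cycle and
    no queueing (conflicting requests are simply dropped).
    """
    return memories * (1.0 - (1.0 - 1.0 / memories) ** processors)


def simulate_busy_memories(processors, memories, cycles=20000, seed=1):
    """Monte Carlo estimate of the same quantity under the same assumptions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        # The set keeps each addressed memory once, however many requests hit it.
        total += len({rng.randrange(memories) for _ in range(processors)})
    return total / cycles


analytic = expected_busy_memories(8, 8)
simulated = simulate_busy_memories(8, 8)
```

Comparing `analytic` with `simulated` mirrors, in miniature, the practice of checking an analytic model against a simulation.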
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[Figure: Processors, ICN, Queues, Memory Clusters]
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In the shared memory cluster multiprocessor model, the processors are connected 
to the memory clusters via a crossbar switch. It is assumed that this switch introduces no 
delay in accessing memory. Requests to memory are queued at each memory cluster. 
Delays in memory access time may be introduced by interference from other processors 
accessing memory levels within this same cluster. 

A Network II.5 (CACI, Inc.) simulation has been developed in order to evaluate 
the analytic model. An eight-processor/eight-memory cluster system is evaluated under a 
variety of access distributions and intervals of computation time between requests to 
memory. The data collected from 63 simulation runs correlate with the results of the 
analytic model with an overall correlation coefficient of 0.9950. 
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Modeling the effects of the interconnection 
network 

A polynomial surface representation of 
performance is developed in a (k + 1) space. 

Independent variables may be quantitative and/or 
qualitative: 

size 

average degree (per node) 

diameter 

radius 

girth 

node-connectivity 
edge-connectivity 
connection cost 
minimum dominating set size 
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For the analysis of the effect of the interconnection network on performance, a 
polynomial surface representation of performance is developed. Variables thought to 
influence the performance of a network are: size, average degree (per node), diameter, 
radius, girth, node-connectivity, edge-connectivity, connection cost, and minimum 
dominating set size. 
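Several of these independent variables are standard graph parameters that can be computed directly from an adjacency list. The following sketch computes size, average degree, diameter and radius by breadth-first search; the function names and the two example networks (an eight-node ring and a three-dimensional hypercube) are chosen here for illustration and are not taken from the study.

```python
from collections import deque


def eccentricities(adj):
    """Breadth-first search from every node; returns {node: eccentricity}."""
    ecc = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("network is not connected")
        ecc[src] = max(dist.values())
    return ecc


def network_parameters(adj):
    """Size, average degree, diameter and radius of an undirected network."""
    ecc = eccentricities(adj)
    return {
        "size": len(adj),
        "average_degree": sum(len(nbrs) for nbrs in adj.values()) / len(adj),
        "diameter": max(ecc.values()),
        "radius": min(ecc.values()),
    }


def ring(n):
    """Ring of n nodes, each linked to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim):
    """Binary hypercube: nodes differing in exactly one bit are linked."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}


ring_params = network_parameters(ring(8))
cube_params = network_parameters(hypercube(3))
```

Both example networks have eight nodes, yet the hypercube trades a higher degree (more links per node) for a smaller diameter, which is the kind of trade-off the performance surface is meant to capture.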
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Performance measures: 


message completion rate 
average message delay 
connection cost 
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Dependent measurements are used to gauge performance. Typical performance 
measures are message completion rate, average message delay, and connection cost. 
Although the different levels of the independent variables determine a discrete set of 
performance points, the problem is treated as continuous in the performance variable. 
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Optimization: 

Response surface methodology (RSM) 
optimizes a response variable, based on 
some polynomial function of several 
independent variables. 


Gradient vector may indicate direction of 
steepest ascent. 
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A polynomial function of several independent variables is used to estimate the 
performance surface. This function is estimated through curve fitting techniques. 
Response surface methodology (RSM) optimizes the response (performance) variable, 
working from this estimated polynomial function. In the situation where an optimum is 
not indicated, gradient vector methods may detect the direction of steepest ascent. 
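The gradient step can be sketched for a second-order polynomial in two independent variables. In the sketch below the coefficients are hypothetical stand-ins for values that would normally come from curve fitting, and the function names are chosen only for this illustration.

```python
import math


def response(beta, x1, x2):
    """Second-order response surface in two variables."""
    b0, b1, b2, b11, b22, b12 = beta
    return b0 + b1 * x1 + b2 * x2 + b11 * x1 * x1 + b22 * x2 * x2 + b12 * x1 * x2


def gradient(beta, x1, x2):
    """Partial derivatives of the response with respect to x1 and x2."""
    _, b1, b2, b11, b22, b12 = beta
    return (b1 + 2 * b11 * x1 + b12 * x2, b2 + 2 * b22 * x2 + b12 * x1)


def steepest_ascent_step(beta, x1, x2, step=0.1):
    """Move a fixed distance along the normalized gradient direction."""
    g1, g2 = gradient(beta, x1, x2)
    norm = math.hypot(g1, g2)
    if norm == 0.0:
        return x1, x2  # stationary point: no ascent direction
    return x1 + step * g1 / norm, x2 + step * g2 / norm


# Hypothetical fitted coefficients (b0, b1, b2, b11, b22, b12).
beta = (10.0, 2.0, 1.0, -1.0, -0.5, 0.0)
start = (0.0, 0.0)
moved = steepest_ascent_step(beta, *start)
```

With these coefficients the gradient vanishes at (1, 1), so the fitted surface has its stationary point there; starting from the origin, each steepest-ascent step moves toward higher predicted performance.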
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[Figure: performance plotted against node connectivity and diameter]







An application of this analysis uses independent variables of node-connectivity 
and network diameter and the performance measure of message completion rate/cost. It 
may be seen from the diagram that better network performance levels occur at the 
"corners" of the graph, for example when both diameter and node-connectivity are high. 
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Network Synthesis 
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The results of this analysis may be used to identify appropriate levels of 
independent variables to indicate optimum or near-optimum performance networks. 
Existing, well-studied networks; hybrids of existing networks; or 
completely novel networks may be suggested. 
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Portable Software Environment 



OBJECTIVES AND CONSTRAINTS 





GUIDING PRINCIPLES: 
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) High Level Language Layer (2 levels) 





DMAP/NASTRAN, DVS/ASKA, NICE/SPAR, ICES/STRUDL ...) systems. It is even more 
appropriate for MIMD computers. 



GUIDING PRINCIPLES (CONT. ): 
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MAJOR PROBLEMS TO SOLVE AND AUTOMATE: 
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MAJOR PROBLEMS TO SOLVE 
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METHODS OF SOLUTION 
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PROPOSED SOLUTION: 
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


EXAMPLE OF HIGH LEVEL DATA DEFINITION: 
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EXAMPLE OF DATA DEFINITION 
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EXAMPLE OF CONCURRENT TASK EXPRESSION: 
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IMPLEMENTATION CONCEPT: 
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As noted earlier, the high level language layer is most appropriate to meet all of the needs. 
Furthermore, interpretive data base management is much too slow to be practical. Consequently, all 
high level code and DBMS instructions will be compiled into IL code. 

Data structures will be defined by the programmer using a high level language. The definition 
will be compiled into an internal format. This will allow the data manager to manipulate and exam- 
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RUN TIME SUPPORT SYSTEM: 
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Can be Implemented in Extended FORTRAN Environments 
like PISCES, SCHEDULE, or FORCE. 



RUN TIME SUPPORT SYSTEM 
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Fetch and Interpret 
IL Code 



SYSTEM OPERATION 
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Get or 

Data Sleep 

Order 
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DBMS/MEMORY MANAGER INTERACTION 
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MEMORY CONFIGURATIONS 
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[Figure: Memory Management Tables. Labels: Owner Table (Ac, Pt); Use Set; Wait Set; Global Memory Manager; Local Memory Manager; Global Memory Table (Ob, Pt); Global Memory; Local Memory Table (Ob, Pt); Local Memory]


Memory managers ensure that the desired object is present in memory for a processor when it 
needs it. There are two basic types of memory managers. One manages the memory on the processor 
itself. This manager, called a local manager, sees its own memory directly. It controls what is in that 
memory and where each item is placed, and it needs no permission to look at data. The one exception is when its 
memory is used (partially) to store global tables. That will occur in totally distributed systems; in 
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